Dyr og Data

Statistical thinking — models with categorical predictors

Gavin Simpson

Aarhus University

Mona Larsen

Aarhus University

2024-10-09

Introduction

We ended the last class starting to think about models with categorical predictors

We’ll recap those last few slides on ANOVA

Then look at categorical predictors more generally, including the concepts of interactions

Analysis of Variance

Analysis of Variance

Analysis of Variance (ANOVA) is the classic name for a Gaussian linear model where the predictor (explanatory) variables are categorical

Earlier, we used the ANOVA table to partition the variance in \(y\) into components explained by the \(x_j\) and a residual component not explained by the regression model

A slightly more restricted view of ANOVA is that it is a technique for partitioning the variation in \(y\) into that explained by one or more categorical predictor variables

The categories of each factor are the groups or experimental treatments

Analysis of Variance

ANOVA considers the different sources of variation that might arise in a data set

Of particular interest are the differences in the mean value of \(y\) between groups

We can think of within-group and between-group variances

  • Between-group variance is that due to the treatment (group) effects
  • Within-group variance is that due to the variability of individuals & measurement error

There will always be variation between individuals, but is this within-group variance large or small relative to the variance between groups?
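The within/between decomposition can be sketched directly in R; a minimal example with simulated data (the group labels and means here are made up for illustration):

```r
# Simulated data: two groups with different population means
set.seed(42)
y <- c(rnorm(10, mean = 5), rnorm(10, mean = 8))
g <- factor(rep(c("A", "B"), each = 10))

grand_mean <- mean(y)
group_means <- tapply(y, g, mean)

# Between-group SS: spread of the group means about the grand mean
ss_between <- sum(table(g) * (group_means - grand_mean)^2)
# Within-group SS: spread of observations about their group mean
ss_within <- sum((y - group_means[g])^2)

# The two components sum to the total sum of squares
ss_between + ss_within
sum((y - grand_mean)^2)
```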

ANOVA how many ways?

One of the complications surrounding ANOVA is the convoluted nomenclature used to describe variants of the method

Variants are commonly distinguished by the number of categorical variables in the model

  • one-way ANOVA — contains a single categorical variable
  • two-way ANOVA — contains two categorical variables
  • three-way ANOVA — contains three categorical variables

Two-way and higher ANOVA potentially involve the consideration of factor–factor interactions

One-way ANOVA

In a one-way ANOVA we have a single categorical variable \(x\) with two or more levels. With two levels we have the same analysis as the t test

If we consider differences between animals of different breeds, we might use a breed factor whose levels might be

  • Danish Holstein
  • Red Danish
  • Jersey

If we’re testing the effect of parity, the factor might be parity with levels: 1, 2, 3, & 4+

One-way ANOVA

Assume we have a single categorical variable \(x\) with three levels. The One-way ANOVA model using dummy (or indicator) coding is

\[y_i = \beta_0 + \beta_1D_{i1} + \beta_2D_{i2} + \varepsilon_i\]

where \(D_{ij}\) is the dummy coding for the \(j\)th level (group) for the \(i\)th observation

Group  Intercept  \(D_1\)  \(D_2\)
  1        1        0        0
  2        1        1        0
  3        1        0        1
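In R this coding is generated automatically by model.matrix(); a minimal sketch for a three-level factor:

```r
x <- factor(c("1", "2", "3"))
model.matrix(~ x)  # columns: (Intercept), x2, x3 — dummy coding as in the table
```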

One-way ANOVA

Measure the between-group variance as the regression sums of squares

The within-group variance is the residual sums of squares

An omnibus F test is used to test the null hypothesis of no differences among the population group means

\[\mathsf{H_0:} \; \beta_1 = \beta_2 = 0\]

One-way ANOVA

Low & colleagues (2016, PLOS One) conducted a study to examine the effects of two different anesthetics on aspects of mouse physiology

  • 12 mice anesthetised with isoflurane
  • 11 mice anesthetised with alpha chloralose

Blood CO2 levels were recorded after 120 minutes as the response variable

One-way ANOVA

mice <- read_csv("data/mice-anesthesia/mice-anesthesia.csv") |>
  mutate(
    anesth = fct_recode(
      anesth,
      "Isoflurane" = "iso",
      "Alpha chloralose" = "ac")) |>
  rename(anesthetic = anesth)

mice
# A tibble: 23 × 2
   anesthetic   co2
   <fct>      <dbl>
 1 Isoflurane    43
 2 Isoflurane    35
 3 Isoflurane    50
 4 Isoflurane    39
 5 Isoflurane    56
 6 Isoflurane    54
 7 Isoflurane    39
 8 Isoflurane    51
 9 Isoflurane    49
10 Isoflurane    54
# ℹ 13 more rows

One-way ANOVA

mice_labs <- labs(
  x = "Anesthetic",
  y = expression(CO[2])
)
mice |>
  ggplot(aes(y = co2, x = anesthetic)) +
  geom_boxplot() +
  geom_point(
    position = position_jitter(width = 0.1),
    aes(colour = anesthetic)) +
  guides(colour = "none") +
  mice_labs

One-way ANOVA

Results of fitting one-way ANOVA to the anesthesia data

mice_m <- lm(co2 ~ anesthetic, data = mice)
anova(mice_m)
Analysis of Variance Table

Response: co2
           Df Sum Sq Mean Sq F value   Pr(>F)   
anesthetic  1 2509.1 2509.09  9.5647 0.005515 **
Residuals  21 5508.9  262.33                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This is the decomposition of the variance in the data into that explained by anesthetic and the unexplained (residual) variance
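The F statistic is just the ratio of the two mean squares in the table, and with a single model degree of freedom it equals the squared t statistic for the group contrast:

```r
# Mean square ratio from the ANOVA table
2509.09 / 262.33        # ≈ 9.565
# Squared t statistic for the anesthetic contrast
(-3.09268)^2            # ≈ 9.565
```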

One-way ANOVA

This is just a Gaussian linear model

summary(mice_m)

Call:
lm(formula = co2 ~ anesthetic, data = mice)

Residuals:
   Min     1Q Median     3Q    Max 
-20.91 -11.00   0.00   5.00  44.09 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)            70.909      4.883  14.520 2.01e-12 ***
anestheticIsoflurane  -20.909      6.761  -3.093  0.00552 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16.2 on 21 degrees of freedom
Multiple R-squared:  0.3129,    Adjusted R-squared:  0.2802 
F-statistic: 9.565 on 1 and 21 DF,  p-value: 0.005515

Contrasts

With the default contrasts (treatment contrasts) and an intercept we have:

coef(summary(mice_m))
                      Estimate Std. Error  t value     Pr(>|t|)
(Intercept)           70.90909   4.883451 14.52028 2.010337e-12
anestheticIsoflurane -20.90909   6.760831 -3.09268 5.515219e-03
  • (Intercept) — mean CO2 for the reference level (Alpha chloralose)
  • anestheticIsoflurane — difference between the mean CO2 for Isoflurane samples and that for Alpha chloralose samples
mice |>
  group_by(anesthetic) |>
  summarise(mean = mean(co2))
# A tibble: 2 × 2
  anesthetic        mean
  <fct>            <dbl>
1 Alpha chloralose  70.9
2 Isoflurane        50  
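The coefficients reproduce these group means: adding the treatment contrast to the intercept recovers the mean for the non-reference level:

```r
coef(mice_m)[1]                     # Alpha chloralose mean: 70.909
coef(mice_m)[1] + coef(mice_m)[2]   # Isoflurane mean: 70.909 - 20.909 = 50
```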

One-way ANOVA

Comparisons

mice_m |>
  avg_predictions(by = "anesthetic", hypothesis = "pairwise",
    df = df.residual(mice_m))

 Estimate Std. Error     t Pr(>|t|)   S 2.5 % 97.5 % Df
    -20.9       6.76 -3.09  0.00552 7.5   -35  -6.85 21

Term: Isoflurane - Alpha chloralose
Type:  response 
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, df 

One-way ANOVA

When factors contain more than two levels, we typically won’t be able to read off the differences of interest from the summary() output

Consider the small milk yield data

milk <- read_csv("data/itve-milk/milk-yield.csv") |>
  mutate(breed = factor(
    breed, levels = c(2,8),
      labels = c("Holsteins", "Cross-breed")),
    parity = factor(parity, levels = 1:4)) |>
  rename(
    cow_id = cowid,
    milk_jan_kg = kgmilk1,
    milk_feb_kg = kgmilk2
  )
milk

One-way ANOVA

# A tibble: 111 × 5
   breed     parity cow_id milk_jan_kg milk_feb_kg
   <fct>     <fct>   <dbl>       <dbl>       <dbl>
 1 Holsteins 3         111        16          16.1
 2 Holsteins 4         888        27.4        23  
 3 Holsteins 4         907        17.8        21.7
 4 Holsteins 4         932        21.9        25  
 5 Holsteins 4         982        18.9        19.6
 6 Holsteins 4        1056        18.7        21.7
 7 Holsteins 4        1068         9.1        13.6
 8 Holsteins 4        1089        19.7        22.4
 9 Holsteins 4        1091        11.8        20.2
10 Holsteins 4        1104        29.3        34.6
# ℹ 101 more rows

Milk yield model

milk_m <- lm(milk_jan_kg ~ parity, data = milk)
summary(milk_m)

Call:
lm(formula = milk_jan_kg ~ parity, data = milk)

Residuals:
     Min       1Q   Median       3Q      Max 
-17.7800  -3.1692   0.5222   3.9211  19.3091 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   18.561      1.029  18.046  < 2e-16 ***
parity2        3.417      1.424   2.400  0.01813 *  
parity3        3.319      1.674   1.983  0.04998 *  
parity4        5.230      1.626   3.216  0.00172 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.908 on 107 degrees of freedom
Multiple R-squared:  0.09834,   Adjusted R-squared:  0.07306 
F-statistic:  3.89 on 3 and 107 DF,  p-value: 0.01106

Pairwise comparisons

Comparisons

milk_m |>
  avg_predictions(by = "parity", hypothesis = "pairwise")

  Term Estimate Std. Error       z Pr(>|z|)   S   2.5 % 97.5 %
 3 - 4  -1.9109       1.83 -1.0468   0.2952 1.8 -5.4888   1.67
 3 - 2  -0.0978       1.65 -0.0593   0.9527 0.1 -3.3274   3.13
 3 - 1   3.3194       1.67  1.9825   0.0474 4.4  0.0378   6.60
 4 - 2   1.8131       1.60  1.1340   0.2568 2.0 -1.3207   4.95
 4 - 1   5.2303       1.63  3.2162   0.0013 9.6  2.0429   8.42
 2 - 1   3.4172       1.42  2.3998   0.0164 5.9  0.6263   6.21

Type:  response 
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 

Comparisons

Comparisons are differences of model predictions

Milk model is

\[\begin{align*} \mathtt{milk\_jan\_kg}_i & \sim \text{Normal}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} \end{align*}\]

coef(milk_m)
(Intercept)     parity2     parity3     parity4 
  18.560606    3.417172    3.319394    5.230303 

Comparisons

\[\begin{align*} \mathtt{milk\_jan\_kg}_i & \sim \text{Normal}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} \end{align*}\]

coef(milk_m)
(Intercept)     parity2     parity3     parity4 
  18.560606    3.417172    3.319394    5.230303 
  • \(\beta_0\) is (Intercept)
  • \(\beta_1\) is parity2
  • \(\beta_2\) is parity3
  • \(\beta_3\) is parity4

Comparisons

coef(milk_m)
(Intercept)     parity2     parity3     parity4 
  18.560606    3.417172    3.319394    5.230303 
  • Parity 1 \(\widehat{\mathtt{milk\_jan\_kg}} = \beta_0\)
  • Parity 4 \(\widehat{\mathtt{milk\_jan\_kg}} = \beta_0 + \beta_3\)

Comparison between parities 3 and 4?

\[\begin{align*} C_{\mathtt{3} - \mathtt{4}} & = \widehat{\mathtt{milk\_jan\_kg}}_{\mathtt{parity} = 3} - \widehat{\mathtt{milk\_jan\_kg}}_{\mathtt{parity} = 4} \\ & = (\beta_0 + \beta_2) - (\beta_0 + \beta_3) \\ & = \beta_2 - \beta_3 \\ & = 3.319394 - 5.230303 \\ & \approx -1.911 \end{align*}\]
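The same comparison computed directly from the fitted coefficients:

```r
unname(coef(milk_m)["parity3"] - coef(milk_m)["parity4"])
# 3.319394 - 5.230303 ≈ -1.911, matching the avg_predictions() output
```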

Interactions

Interactions

Up to now we’ve only considered the main effects of the variables in our model

There, the terms are additive: each variable contributes an amount to the model irrespective of the values of the other predictors

But what if the effect of one variable depends upon the value of one or more other variables?

This is where interaction terms come in

Interaction terms

Two explanatory variables interact when the partial effect of one depends on the value of the other

Marginal effect — effect of \(x_1\) on \(y\) ignoring the effects on \(y\) of the other variables \(x_j\)

Partial effect — effect of \(x_1\) on \(y\) accounting for the effects on \(y\) of the other variables \(x_j\)

Interactions occur in several types

  • continuous – factor interactions
  • continuous – continuous interactions
  • factor – factor interactions
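In R's formula syntax the three types all use the same operator; a hypothetical sketch (y, x1, x2 continuous; f1, f2 factors):

```r
# `*` expands to the main effects plus the interaction, e.g.
# y ~ x1 * x2 is shorthand for y ~ x1 + x2 + x1:x2
m_cf <- y ~ x1 * f1   # continuous–factor
m_cc <- y ~ x1 * x2   # continuous–continuous
m_ff <- y ~ f1 * f2   # factor–factor
```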

Continuous–factor

We’ve been plotting continuous–factor interactions quite a lot

Continuous–factor

Two models

\[\begin{align*} \mathtt{bill\_depth\_mm}_i & \sim \text{Normal}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \mathtt{bill\_length\_mm}_i + \mathtt{species}_i \end{align*}\]

\[\begin{align*} \mathtt{bill\_depth\_mm}_i & \sim \text{Normal}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \mathtt{bill\_length\_mm}_i + \mathtt{species}_i \\ & + (\mathtt{bill\_length\_mm}_i \times \mathtt{species}_i) \end{align*}\]

bill_m1 <- lm(bill_depth_mm ~ bill_length_mm + species, data = penguins)
bill_m2 <- lm(bill_depth_mm ~ bill_length_mm * species, data = penguins)

Continuous–factor

bill_m1 |>
  plot_predictions(
    by = c("bill_length_mm", "species")) 

bill_m2 |>
  plot_predictions(
    by = c("bill_length_mm", "species"))

Continuous–factor

Interactions between factors can be represented as dummy variables, indicating group/combination membership

Reference level (Adelie) absorbed into the intercept term

Model is parameterised in terms of differences of means from the reference level

head(model.matrix(bill_m1), n = 3)
  (Intercept) bill_length_mm speciesChinstrap speciesGentoo
1           1           39.1                0             0
2           1           39.5                0             0
3           1           40.3                0             0
tail(model.matrix(bill_m2), n = 3)
    (Intercept) bill_length_mm speciesChinstrap speciesGentoo
342           1           49.6                1             0
343           1           50.8                1             0
344           1           50.2                1             0
    bill_length_mm:speciesChinstrap bill_length_mm:speciesGentoo
342                            49.6                            0
343                            50.8                            0
344                            50.2                            0

Continuous–factor

Alternative parameterisation

bill_depth_mm ~ bill_length_mm * species

An overall effect of bill_length_mm averaged over the three species plus species-specific differences in effect of bill_length_mm

bill_depth_mm ~ species / bill_length_mm

Directly estimates the effect of bill_length_mm within each species

bill_m3 <- lm(bill_depth_mm ~ species / bill_length_mm, data = penguins)
bill_m4 <- lm(bill_depth_mm ~ species / bill_length_mm - 1, data = penguins)
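bill_m2 and bill_m3 are reparameterisations of the same model: their design matrices span the same column space, so the fitted values agree even though the coefficients differ.

```r
all.equal(fitted(bill_m2), fitted(bill_m3))  # same fit, different coding
```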

Interaction?

summary(bill_m4)

Call:
lm(formula = bill_depth_mm ~ species/bill_length_mm - 1, data = penguins)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6574 -0.6675 -0.0524  0.5383  3.5032 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
speciesAdelie                   11.40912    1.13812  10.025  < 2e-16 ***
speciesChinstrap                 7.56914    1.70983   4.427 1.29e-05 ***
speciesGentoo                    5.25101    1.33528   3.933 0.000102 ***
speciesAdelie:bill_length_mm     0.17883    0.02927   6.110 2.76e-09 ***
speciesChinstrap:bill_length_mm  0.22221    0.03493   6.361 6.55e-10 ***
speciesGentoo:bill_length_mm     0.20484    0.02805   7.303 2.06e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9548 on 336 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.997, Adjusted R-squared:  0.9969 
F-statistic: 1.858e+04 on 6 and 336 DF,  p-value: < 2.2e-16
anova(bill_m1, bill_m2)
Analysis of Variance Table

Model 1: bill_depth_mm ~ bill_length_mm + species
Model 2: bill_depth_mm ~ bill_length_mm * species
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    338 307.20                           
2    336 306.32  2   0.87243 0.4785 0.6202

No, these effects look small

Continuous–factor

scc <- read_csv("data/itve-milk-scc/milk-scc.csv") |>
  rename(
    cow_id = cowid,
    milk_yield_kg = kgmilk,
    somatic_cell_count = cellcount) |>
  mutate(
    breed = factor(
      breed,
      levels = c(1,2,3,8),
      labels = c("Red Danish", "Holstein", "Jersey", "Cross-breed")),
    parity = factor(parity)
  ) |>
  tidyr::drop_na()

Continuous–factor

scc_labs <- labs(
  x = "Milk yield (kg)",
  y = expression("Somatic cell count (1000 cells" ~ ( ml^{-1} )))
p1 <- scc |>
  ggplot(aes(
    x = milk_yield_kg,
    y = somatic_cell_count,
    colour = breed)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE, linewidth = 1) +
    scc_labs

p2 <- scc |>
  ggplot(aes(
    x = milk_yield_kg,
    y = somatic_cell_count,
    colour = parity)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE, linewidth = 1) +
    scc_labs

p1 + p2

Continuous–factor

Continuous–factor

scc_m1 <- lm(somatic_cell_count ~ milk_yield_kg + parity, data = scc)
scc_m2 <- lm(somatic_cell_count ~ milk_yield_kg * parity, data = scc)
anova(scc_m2)
Analysis of Variance Table

Response: somatic_cell_count
                       Df    Sum Sq Mean Sq F value    Pr(>F)    
milk_yield_kg           1   1368328 1368328  2.5764    0.1087    
parity                  7  21771120 3110160  5.8561 1.078e-06 ***
milk_yield_kg:parity    7   3678319  525474  0.9894    0.4373    
Residuals            1123 596425913  531101                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Continuous–factor

scc_m2 |>
 plot_predictions(by = c("milk_yield_kg", "parity")) +
 scc_labs

Maybe, but very uncertain effects

Also, the model is wrong. Why?

Continuous–continuous

In continuous–continuous interactions we don’t need to worry about coding

This type of interaction is represented by a new variable that is the product of the two variables

If we have \(x_1\) and \(x_2\), the interaction would be \((x_1 \times x_2)\)

bill_cint1 <- lm(bill_depth_mm ~ bill_length_mm + flipper_length_mm, data = penguins)
bill_cint2 <- lm(bill_depth_mm ~ bill_length_mm * flipper_length_mm, data = penguins)
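The * formula is just shorthand for including the product column; building it by hand gives the same fit (a sketch, adding a hypothetical len_x_flip column):

```r
penguins$len_x_flip <- penguins$bill_length_mm * penguins$flipper_length_mm
bill_cint2b <- lm(
  bill_depth_mm ~ bill_length_mm + flipper_length_mm + len_x_flip,
  data = penguins)
# Coefficients match bill_cint2 (only the interaction term's name differs)
all.equal(unname(coef(bill_cint2)), unname(coef(bill_cint2b)))
```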

Continuous–continuous

bill_cint1 <- lm(bill_depth_mm ~ bill_length_mm + flipper_length_mm, data = penguins)
bill_cint2 <- lm(bill_depth_mm ~ bill_length_mm * flipper_length_mm, data = penguins)
bill_cint1 |>
  plot_predictions(
    condition = list("bill_length_mm",
      flipper_length_mm = "fivenum")) +
  labs(title = "Without interaction")

bill_cint2 |>
  plot_predictions(
    condition = list("bill_length_mm",
      flipper_length_mm = "fivenum")) +
  labs(title = "With interaction")

Continuous–continuous

anova(bill_cint1, bill_cint2)
Analysis of Variance Table

Model 1: bill_depth_mm ~ bill_length_mm + flipper_length_mm
Model 2: bill_depth_mm ~ bill_length_mm * flipper_length_mm
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    339 825.32                           
2    338 820.06  1    5.2617 2.1687 0.1418

Factor–factor

Factor–factor interactions estimate different effects of levels of factor \(f_1\) for each level of factor \(f_2\)

This is where the names two-way ANOVA, three-way ANOVA come from — multiple factors in a model

Pig weight gain

Pigs were fed different amounts (0 and 5 mg) of two vitamins and the average daily gain of individual pigs was recorded

pig_gain <- read_excel("data/pig-daily-gain/pig-daily-gain.xlsx") |>
  janitor::clean_names() |>
  mutate(
    across(starts_with("vitamin"), .fns = factor)
  )
pig_gain

Pig weight gain

# A tibble: 20 × 3
    gain vitamin_1 vitamin_2
   <dbl> <fct>     <fct>    
 1 0.585 0         0        
 2 0.536 0         0        
 3 0.458 0         0        
 4 0.486 0         0        
 5 0.536 0         0        
 6 0.567 0         5        
 7 0.545 0         5        
 8 0.589 0         5        
 9 0.536 0         5        
10 0.549 0         5        
11 0.473 5         0        
12 0.45  5         0        
13 0.869 5         0        
14 0.473 5         0        
15 0.464 5         0        
16 0.684 5         5        
17 0.702 5         5        
18 0.9   5         5        
19 0.698 5         5        
20 0.693 5         5        

Pig weight gain

Models with main effects only and with main effects plus interaction

gain_m1 <- lm(gain ~ vitamin_1 + vitamin_2, data = pig_gain)

gain_m2 <- lm(gain ~ vitamin_1 * vitamin_2, data = pig_gain)

Pig weight gain?

anova(gain_m1, gain_m2)
Analysis of Variance Table

Model 1: gain ~ vitamin_1 + vitamin_2
Model 2: gain ~ vitamin_1 * vitamin_2
  Res.Df     RSS Df Sum of Sq     F Pr(>F)
1     17 0.20559                          
2     16 0.17648  1  0.029108 2.639 0.1238

Pig weight gain?

gain_m1 |>
  plot_predictions(
    by = c("vitamin_1", "vitamin_2"))

gain_m2 |>
  plot_predictions(
    by = c("vitamin_1", "vitamin_2"))

Is there an effect?

Pig weight gain

The combined effect is small overall but most visible when vitamin_1 = 5mg

gain_m2 |>
  avg_predictions(
    by = c("vitamin_1", "vitamin_2"),
    hypothesis = ~ sequential | vitamin_1)

                      Hypothesis vitamin_1 Estimate Std. Error     z Pr(>|z|)
 (vitamin_2[5]) - (vitamin_2[0])         0    0.037     0.0664 0.557  0.57751
 (vitamin_2[5]) - (vitamin_2[0])         5    0.190     0.0664 2.854  0.00431
   S   2.5 % 97.5 %
 0.8 -0.0932  0.167
 7.9  0.0594  0.320

Columns: hypothesis, vitamin_1, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
Type:  response 

Pig weight gain

gain_m2 |>
  avg_predictions(variables = c("vitamin_1", "vitamin_2"))

 vitamin_1 vitamin_2 Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
         0         0    0.520      0.047 11.1   <0.001  92.3 0.428  0.612
         0         5    0.557      0.047 11.9   <0.001 105.4 0.465  0.649
         5         0    0.546      0.047 11.6   <0.001 101.3 0.454  0.638
         5         5    0.735      0.047 15.7   <0.001 181.1 0.643  0.827

Columns: vitamin_1, vitamin_2, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
Type:  response 

Expected average weight gain for each combination of vitamin_1 and vitamin_2

Each row in the output has a name

  • b1 for row 1,
  • b2 for row 2, etc

Pig weight gain

This allows you to specify any hypothesis test you want

For example: what is the difference in average weight gain between pigs given 0 and 5 mg of vitamin_2 when vitamin_1 = 0 mg?

gain_m2 |>
  avg_predictions(
    variables = c("vitamin_1", "vitamin_2"),
    hypothesis = "b2 = b1") # test hypothesis of no diff!

  Term Estimate Std. Error     z Pr(>|z|)   S   2.5 % 97.5 %
 b2=b1    0.037     0.0664 0.557    0.578 0.8 -0.0932  0.167

Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
Type:  response 

Fail to reject \(\text{H}_0\)

Pig weight gain

This allows you to specify any hypothesis test you want

For example: what is the difference in average weight gain between pigs given 0 and 5 mg of vitamin_2 when vitamin_1 = 5 mg?

gain_m2 |>
  avg_predictions(
    variables = c("vitamin_1", "vitamin_2"),
    hypothesis = "b4 = b3") # test hypothesis of no diff!

  Term Estimate Std. Error    z Pr(>|z|)   S  2.5 % 97.5 %
 b4=b3     0.19     0.0664 2.85  0.00431 7.9 0.0594   0.32

Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
Type:  response 

Reject \(\text{H}_0\) at the 5% significance level

Pig weight gain

Or you can test all pairwise comparisons

gain_m2 |>
  avg_predictions(
    variables = c("vitamin_1", "vitamin_2"),
    hypothesis = "pairwise") # test all pairwise comparisons

        Term Estimate Std. Error      z Pr(>|z|)   S  2.5 %  97.5 %
 0, 0 - 0, 5  -0.0370     0.0664 -0.557  0.57751 0.8 -0.167  0.0932
 0, 0 - 5, 0  -0.0256     0.0664 -0.385  0.69994 0.5 -0.156  0.1046
 0, 0 - 5, 5  -0.2152     0.0664 -3.240  0.00120 9.7 -0.345 -0.0850
 0, 5 - 5, 0   0.0114     0.0664  0.172  0.86373 0.2 -0.119  0.1416
 0, 5 - 5, 5  -0.1782     0.0664 -2.683  0.00730 7.1 -0.308 -0.0480
 5, 0 - 5, 5  -0.1896     0.0664 -2.854  0.00431 7.9 -0.320 -0.0594

Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
Type:  response 

Pig weight gain

Or you can test all pairwise comparisons

gain_m2 |>
  avg_predictions(
    variables = c("vitamin_1", "vitamin_2"),
    hypothesis = "reference") # test groups against 0,0 "control"

        Term Estimate Std. Error     z Pr(>|z|)   S   2.5 % 97.5 %
 0, 5 - 0, 0   0.0370     0.0664 0.557   0.5775 0.8 -0.0932  0.167
 5, 0 - 0, 0   0.0256     0.0664 0.385   0.6999 0.5 -0.1046  0.156
 5, 5 - 0, 0   0.2152     0.0664 3.240   0.0012 9.7  0.0850  0.345

Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
Type:  response